In this project I perform a bulk RNA-seq analysis on three tissues (brain, liver and lung) to extract differentially expressed genes. I run the analysis without excluding rRNA, mRNA, pseudogenes or genes on non-canonical chromosomes. The aim of this work is to understand whether the methods seen during the lessons are robust enough to remain reliable in the presence of additional sources of variation. I also want to show that this workflow is able to find meaningful differentially expressed genes among the three tissues.
The tissues assigned to me are brain, liver and lung. The first step is to load the corresponding datasets:
rse_brain <- readRDS("rse_brain.RDS")
rse_liver <- readRDS("rse_liver.RDS")
rse_lung <- readRDS("rse_lung.RDS")
Then I need to transform the counts, because the raw values are stored as overall read coverage over exons rather than as read counts:
assays(rse_brain)$counts <- transform_counts(rse_brain)
assays(rse_liver)$counts <- transform_counts(rse_liver)
assays(rse_lung)$counts <- transform_counts(rse_lung)
Each replicate needs to be checked against some quality parameters before performing any type of analysis:
RIN > 6
% of mapped reads > 85%
% of rRNA reads → never higher than 10% (gtex.smrrnart reports it as a fraction, so the value must stay below 0.1)
I checked each parameter for each replicate:
#Brain
colData(rse_brain)[101,]$'recount_qc.star.uniquely_mapped_reads_%_both'
## [1] 88.6
colData(rse_brain)[101,]$gtex.smrin
## [1] 8.8
colData(rse_brain)[101,]$gtex.smrrnart
## [1] 0.0327779
#Liver
colData(rse_liver)[103,]$'recount_qc.star.uniquely_mapped_reads_%_both'
## [1] 92
colData(rse_liver)[103,]$gtex.smrin
## [1] 6.6
colData(rse_liver)[103,]$gtex.smrrnart
## [1] 0.0158024
#Lung
colData(rse_lung)[104,]$'recount_qc.star.uniquely_mapped_reads_%_both'
## [1] 92
colData(rse_lung)[104,]$gtex.smrin
## [1] 6.7
colData(rse_lung)[104,]$gtex.smrrnart
## [1] 0.00239451
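If a replicate was not good enough, I checked the next one. For example, lung sample 100 fails the RIN threshold (5.5), while liver sample 101 passes all three checks: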
colData(rse_lung)[100,]$'recount_qc.star.uniquely_mapped_reads_%_both'
## [1] 88.2
colData(rse_lung)[100,]$gtex.smrin
## [1] 5.5
colData(rse_lung)[100,]$gtex.smrrnart
## [1] 0.00852349
colData(rse_liver)[101,]$'recount_qc.star.uniquely_mapped_reads_%_both'
## [1] 90.5
colData(rse_liver)[101,]$gtex.smrin
## [1] 6.2
colData(rse_liver)[101,]$gtex.smrrnart
## [1] 0.0260051
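The three thresholds can also be applied in one step. The following is only a minimal sketch: `passes_qc` is a hypothetical helper that I introduce here (it is not part of recount3 or edgeR), and it assumes that the rRNA value is a fraction between 0 and 1. The numbers are copied from the two examples above.

```r
# Hypothetical helper: TRUE when a sample meets all three thresholds.
# rin and mapped_pct are used as reported in colData; rrna_fraction is on a 0-1 scale.
passes_qc <- function(rin, mapped_pct, rrna_fraction) {
  rin > 6 & mapped_pct > 85 & rrna_fraction <= 0.10
}

# Values copied from the checks on lung sample 100 and liver sample 101
qc <- data.frame(
  sample        = c("lung_100", "liver_101"),
  rin           = c(5.5, 6.2),
  mapped_pct    = c(88.2, 90.5),
  rrna_fraction = c(0.00852349, 0.0260051)
)
qc$pass <- passes_qc(qc$rin, qc$mapped_pct, qc$rrna_fraction)
print(qc)
```

With a helper like this, the same rule is applied to every candidate replicate instead of reading the three values by eye.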
Now I create new RSE objects containing only the samples that passed the quality check:
rse_brain_selected <- rse_brain[,c(98,99,100)]
rse_liver_selected <- rse_liver[,c(98,100,101)]
rse_lung_selected <- rse_lung[,c(98,101,102)]
Now it is necessary to extract the count data for each tissue:
An assay is the part of the RSE object that holds the gene expression data, with one column per sample.
Counts → the “counts” assay contains the actual read counts for each gene in each sample.
I extract the counts from the RSEs in this way:
counts_brain_selected <- assays(rse_brain_selected)$counts
counts_liver_selected <- assays(rse_liver_selected)$counts
counts_lung_selected <- assays(rse_lung_selected)$counts
Now it is possible to combine all the samples into a single count table and to build a “DGEList” object from it:
final_count_table <- cbind(counts_brain_selected, counts_liver_selected, counts_lung_selected)
colnames(final_count_table) <- c("Brain98", "Brain99", "Brain100", "Liver98", "Liver100", "Liver101", "Lung98", "Lung101", "Lung102")
rownames(final_count_table) <- rowData(rse_brain_selected)$gene_name
size <- colSums(final_count_table)
y <- DGEList(counts=final_count_table)
group <- as.factor(c("Brain", "Brain", "Brain", "Liver", "Liver", "Liver", "Lung", "Lung", "Lung"))
y$samples$group <- group
I also add other important quality information:
y$samples$rin <- as.factor(c(colData(rse_brain_selected)$gtex.smrin,colData(rse_liver_selected)$gtex.smrin, colData(rse_lung_selected)$gtex.smrin))
y$samples$slice <- as.factor(c(colData(rse_brain_selected)$gtex.smtsd,colData(rse_liver_selected)$gtex.smtsd,colData(rse_lung_selected)$gtex.smtsd))
y$samples$sex <- as.factor(c(colData(rse_brain_selected)$gtex.sex, colData(rse_liver_selected)$gtex.sex, colData(rse_lung_selected)$gtex.sex))
y$samples$age <- as.factor(c(colData(rse_brain_selected)$gtex.age, colData(rse_liver_selected)$gtex.age, colData(rse_lung_selected)$gtex.age))
y$samples$rRNA <- as.factor(c(colData(rse_brain_selected)$gtex.smrrnart,colData(rse_liver_selected)$gtex.smrrnart, colData(rse_lung_selected)$gtex.smrrnart))
y$samples$mapped <- as.factor(c(colData(rse_brain_selected)$"recount_qc.star.uniquely_mapped_reads_%_both",colData(rse_liver_selected)$"recount_qc.star.uniquely_mapped_reads_%_both", colData(rse_lung_selected)$"recount_qc.star.uniquely_mapped_reads_%_both"))
y$samples$chrm <- as.factor(c(colData(rse_brain_selected)$"recount_qc.aligned_reads%.chrm", colData(rse_liver_selected)$"recount_qc.aligned_reads%.chrm", colData(rse_lung_selected)$"recount_qc.aligned_reads%.chrm"))
Now I can check the final count table:
y
## An object of class "DGEList"
## $counts
## Brain98 Brain99 Brain100 Liver98 Liver100 Liver101 Lung98 Lung101
## SNX18P15 0 0 0 0 0 0 0 0
## SNX18P16 0 0 0 0 0 0 0 0
## ANKRD20A12P 0 0 0 0 0 0 0 0
## ANKRD20A15P 0 0 0 0 0 0 0 0
## LOC105379272 0 0 0 0 0 0 0 0
## Lung102
## SNX18P15 0
## SNX18P16 0
## ANKRD20A12P 0
## ANKRD20A15P 0
## LOC105379272 0
## 54037 more rows ...
##
## $samples
## group lib.size norm.factors rin slice sex
## Brain98 Brain 32205953 1 7.8 Brain - Putamen (basal ganglia) 2
## Brain99 Brain 30288918 1 7.1 Brain - Cerebellum 1
## Brain100 Brain 27710552 1 6.9 Brain - Hypothalamus 1
## Liver98 Liver 28097679 1 7 Liver 2
## Liver100 Liver 27203185 1 6.4 Liver 1
## Liver101 Liver 28375615 1 6.2 Liver 2
## Lung98 Lung 32932931 1 6.1 Lung 1
## Lung101 Lung 33983555 1 7.3 Lung 2
## Lung102 Lung 30548995 1 6.9 Lung 1
## age rRNA mapped chrm
## Brain98 60-69 0.0333394 89.6 14.41
## Brain99 60-69 0.0106194 92 7.53
## Brain100 60-69 0.0438324 88.7 21.35
## Liver98 60-69 0.0143595 88.7 16.21
## Liver100 40-49 0.0148645 88.3 22.4
## Liver101 50-59 0.0260051 90.5 21.3
## Lung98 60-69 0.00328285 92.6 2.37
## Lung101 60-69 0.00209038 85.8 2.14
## Lung102 60-69 0.00466061 91.2 4.04
Genes that have very low counts across all the libraries should be removed prior to downstream analysis. This is justified on both biological and statistical grounds. From biological point of view, a gene must be expressed at some minimal level before it is likely to be translated into a protein or to be considered biologically important. From a statistical point of view, genes with consistently low counts are very unlikely be assessed as significantly DE because low counts do not provide enough statistical evidence for a reliable judgement to be made. Such genes can therefore be removed from the analysis without any loss of information. - From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline. Yunshun Chen, Aaron T. L. Lun and Gordon K. Smyth
First, I look at the number of genes with zero counts in all nine samples. Then filterByExpr builds the logical vector keep.exprs, which is used to remove all genes with zero or very low expression:
table(rowSums(y$counts==0)==9)
##
## FALSE TRUE
## 39732 14310
keep.exprs <- filterByExpr(y, group=group)
y <- y[keep.exprs, keep.lib.sizes=FALSE]
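To make the idea of the filter more concrete, here is a simplified sketch in base R on a made-up count matrix. It only mimics the main rule (a gene is kept if its CPM is above a cutoff in at least as many samples as the smallest group). The cutoff of 10 reads is an assumption for illustration, and the real filterByExpr applies further checks, so its result can differ.

```r
# Toy count matrix: 4 genes x 6 samples, two groups of three (made-up numbers)
counts <- matrix(
  c(  0,   1,   0,   2,   0,   1,   # gene1: low everywhere
    150, 200, 180,   3,   1,   2,   # gene2: expressed in group A only
     40,  35,  50,  45,  38,  41,   # gene3: expressed everywhere
      0,   0,   0,   0,   0,   0),  # gene4: never detected
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:6))
)
group <- factor(rep(c("A", "B"), each = 3))

lib_size <- colSums(counts)
cpm_toy <- t(t(counts) / lib_size) * 1e6         # counts per million, per sample

min_count   <- 10                                # assumed minimum read count
cpm_cutoff  <- min_count / median(lib_size) * 1e6
min_samples <- min(table(group))                 # size of the smallest group

keep_toy <- rowSums(cpm_toy >= cpm_cutoff) >= min_samples
print(keep_toy)
```

Note that gene2 is kept even though it is silent in group B: a gene expressed in only one tissue is exactly what the differential expression analysis is looking for.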
LogCPM is calculated by dividing the number of reads of a gene by the total number of reads in the sample, multiplying by one million and then applying a log2 transformation. This normalizes the expression data by library size, allowing a more accurate comparison between samples sequenced at different depths.
logcpm_before <- cpm(y, log=TRUE)
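As a small worked example of the formula, the calculation can be done by hand on made-up counts for a single sample. This is a sketch, not the exact edgeR computation: the pseudo-count of 2 is an assumption that I use only to avoid log2(0), while cpm(log=TRUE) uses its own offset that depends on the library sizes.

```r
# Made-up counts for three genes in one sample
counts_one <- c(geneA = 50, geneB = 0, geneC = 950)
lib_size_one <- sum(counts_one)

# Plain CPM: reads of the gene / total reads * one million
cpm_plain <- counts_one / lib_size_one * 1e6

# Log2 CPM with a small pseudo-count (illustrative value) so that zero counts stay finite
pseudo <- 2
logcpm_simple <- log2((counts_one + pseudo) / (lib_size_one + 2 * pseudo) * 1e6)

print(cpm_plain)
print(round(logcpm_simple, 2))
```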
Creating a boxplot of the LogCPM:
brain <- c('Brain98', 'Brain99', 'Brain100')
liver <- c('Liver98', 'Liver100', 'Liver101')
lung <- c('Lung98', 'Lung101', 'Lung102')
myColors <- ifelse(colnames(logcpm_before) %in% brain , '#99CCFF' , ifelse(colnames(logcpm_before) %in% liver, '#0099FF' ,'#003399' ) )
boxplot(logcpm_before,notch=T,xlab='Replicates',ylab='LogCPM', main='LogCPM before TMM normalization',col=myColors, varwidth=T)
Check values of the median:
for (i in 1:9){
print(median(logcpm_before[,i]))
}
## [1] 3.272081
## [1] 3.264139
## [1] 3.507099
## [1] 2.337198
## [1] 2.151652
## [1] 2.45868
## [1] 3.370884
## [1] 3.095907
## [1] 2.932823
TMM normalization is a simple and effective method for estimating relative RNA production levels from RNA-seq data. The TMM method estimates scale factors between samples that can be incorporated into currently used statistical methods for DE analysis. - A scaling normalization method for differential expression analysis of RNA-seq data. Mark D Robinson & Alicia Oshlack
The next step is to apply the TMM via the calcNormFactors function in edgeR.
y <- calcNormFactors(y, method = "TMM")
logcpm_after <- cpm(y, log=TRUE)
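The normalization factors do not change the counts: they rescale the library sizes. A short sketch of this, using the lib.size and norm.factors values that appear for Brain98 and Liver98 in the `$samples` table of the fitted model printed later in this report (the gene with 1000 reads is made up):

```r
# Values copied from the $samples table of the fitted model
lib_size     <- c(Brain98 = 32103622, Liver98 = 28074401)
norm_factors <- c(Brain98 = 1.2027051, Liver98 = 0.7049238)

# Effective library size = library size * TMM normalization factor
eff_lib_size <- lib_size * norm_factors

# Its natural log can be compared with the $offset printed for the fitted model
print(log(eff_lib_size))

# CPM of a hypothetical gene with 1000 reads in both samples
print(1000 / eff_lib_size * 1e6)
```

A factor below 1 (liver) shrinks the effective library size and therefore raises the CPM values, which is consistent with the liver medians moving up after TMM.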
Now I can visualize and compare the resulting boxplot after the TMM normalization:
#Same as before
boxplot(logcpm_after,notch=T,xlab='Replicates',ylab='LogCPM', main='LogCPM after TMM normalization',col=myColors, varwidth=T)
Check the new values of the median:
for (i in 1:9){
print(median(logcpm_after[,i]))
}
## [1] 3.007287
## [1] 2.902228
## [1] 3.113191
## [1] 2.835462
## [1] 2.79307
## [1] 2.897711
## [1] 3.058075
## [1] 2.92747
## [1] 2.8449
The first step is to design the linear model. The intercept is not needed here, because there is no reference tissue: each column of the design matrix simply represents one tissue.
design <- model.matrix(~0+group, data=y$samples)
colnames(design) <- levels(y$samples$group)
design
## Brain Liver Lung
## Brain98 1 0 0
## Brain99 1 0 0
## Brain100 1 0 0
## Liver98 0 1 0
## Liver100 0 1 0
## Liver101 0 1 0
## Lung98 0 0 1
## Lung101 0 0 1
## Lung102 0 0 1
## attr(,"assign")
## [1] 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$group
## [1] "contr.treatment"
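To see what removing the intercept changes, the two parameterizations can be compared on the same grouping factor. This is a base R sketch that only needs model.matrix:

```r
group_demo <- factor(rep(c("Brain", "Liver", "Lung"), each = 3))

# Without intercept: one column per tissue, each coefficient is a tissue mean
design_no_int <- model.matrix(~ 0 + group_demo)

# With intercept: the first level (Brain) becomes the baseline,
# and the other columns are differences from it
design_int <- model.matrix(~ group_demo)

print(colnames(design_no_int))
print(colnames(design_int))
```

With the first form every pairwise comparison is written as an explicit contrast between two columns, which is convenient when all three tissues have to be compared with each other.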
The aim of an MDS plot is to determine the major sources of variation in the data. If the data are of good quality, I expect the greatest source of variation to be the difference among the three tissues.
logcpm <- cpm(y, log=TRUE)
plotMDS(logcpm, labels=group, main = 'Multidimensional scaling plot: gene expression profiles - group')
For brain, one sample is a little farther from the other two. In this case it is better to check the other quality information, to understand what the source of variability may be.
plotMDS(logcpm_after, labels=y$samples$rRNA, main = 'Multidimensional scaling plot of distances between gene expression profiles - rRNA% label')
plotMDS(logcpm_after, labels=y$samples$chrm, main = 'Multidimensional scaling plot of distances between gene expression profiles - chrm% label')
plotMDS(logcpm_after, labels=y$samples$slice, main = 'Multidimensional scaling plot of distances between gene expression profiles - slice label')
plotMDS(logcpm_after, labels=y$samples$age, main = 'Multidimensional scaling plot of distances between gene expression profiles - age label')
plotMDS(logcpm_after, labels=y$samples$sex, main = 'Multidimensional scaling plot of distances between gene expression profiles - sex label')
The tissues cluster very well
Biological CV (BCV) is the coefficient of variation with which the (unknown) true abundance of the gene varies between replicate RNA samples. BCV is therefore likely to be the dominant source of uncertainty for high-count genes, so reliable estimation of BCV is crucial for realistic assessment of differential expression in RNA-Seq experiments. If the abundance of each gene varies between replicate RNA samples in such a way that the genewise standard deviations are proportional to the genewise means, a commonly occurring property of measurements on physical quantities, then it is reasonable to suppose that BCV is approximately constant across genes. - Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Davis J. McCarthy, Yunshun Chen and Gordon K. Smyth
Single-gene estimates of the dispersion are not reliable, so it is better to compare them with the estimated trend and see whether they are close to it. The single estimates are then corrected by shrinking them towards the trend, which reduces the effect of sampling variation. The next step is therefore to compute the BCV: the correction is based on the trend curve, which shows the relationship between mean and variance.
y <- estimateDisp(y, design)
plotBCV(y)
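The BCV shown in the plot is the square root of the negative binomial dispersion. As a quick check, I take the five dispersion values that appear in the `$dispersion` slot of the fitted model printed later in this report and convert them:

```r
# First five values of $dispersion from the fitted model output
dispersion <- c(0.3926254, 0.1708760, 0.3996547, 0.3919609, 0.1639608)

# BCV = sqrt(dispersion)
bcv <- sqrt(dispersion)
print(round(bcv, 3))

# Conversely, a BCV of 0.5 corresponds to a dispersion of 0.25
print(0.5^2)
```

A BCV of 0.5 means that the true abundance of a gene varies by about 50% between replicates of the same tissue.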
The “Common” line is a little above 0.5. This is not surprising, because the analysis considers samples from different donors that differ in age, sex and, for brain, sampled region, and all these things increase the biological variability.
The next step is to fit a quasi-likelihood negative binomial generalized log-linear model to the count data, and then to conduct gene-wise statistical tests for a given coefficient or contrast.
fit <- glmQLFit(y, design)
fit
## An object of class "DGEGLM"
## $coefficients
## Brain Liver Lung
## MIR6859-1 -14.64538 -14.607522 -14.692751
## WASH7P -10.12656 -10.369724 -10.262623
## SEPT14P18 -14.47832 -13.576960 -14.195668
## CICP27 -14.80306 -14.544675 -14.811911
## LOC729737 -11.37611 -9.939963 -9.555298
## 23854 more rows ...
##
## $fitted.values
## Brain98 Brain99 Brain100 Liver98 Liver100 Liver101
## MIR6859-1 16.68522 16.79039 15.69678 8.885085 7.779617 9.357301
## WASH7P 1544.39713 1554.13253 1452.90651 620.702803 543.475944 653.691275
## SEPT14P18 19.74676 19.87124 18.57696 25.043248 21.927407 26.374221
## CICP27 14.22889 14.31858 13.38596 9.466053 8.288302 9.969145
## LOC729737 442.56881 445.35863 416.35088 953.994217 835.299769 1004.696116
## Lung98 Lung101 Lung102
## MIR6859-1 16.84706 15.72651 13.36360
## WASH7P 1427.64729 1332.69025 1132.45311
## SEPT14P18 27.79918 25.95018 22.05115
## CICP27 14.93647 13.94300 11.84806
## LOC729737 2896.23243 2703.59546 2297.37937
## 23854 more rows ...
##
## $deviance
## MIR6859-1 WASH7P SEPT14P18 CICP27 LOC729737
## 2.3315482 0.8392765 3.8661459 5.1813846 12.7491286
## 23854 more elements ...
##
## $method
## [1] "oneway"
##
## $counts
## Brain98 Brain99 Brain100 Liver98 Liver100 Liver101 Lung98 Lung101
## MIR6859-1 12 18 19 16 6 4 17 16
## WASH7P 1245 1943 1371 567 532 724 1387 1497
## SEPT14P18 28 9 21 30 28 14 13 26
## CICP27 6 23 13 13 12 2 15 16
## LOC729737 380 638 295 533 1651 468 2142 4334
## Lung102
## MIR6859-1 13
## WASH7P 1025
## SEPT14P18 34
## CICP27 10
## LOC729737 1510
## 23854 more rows ...
##
## $unshrunk.coefficients
## Brain Liver Lung
## MIR6859-1 -14.65453 -14.616329 -14.702344
## WASH7P -10.12666 -10.369851 -10.262737
## SEPT14P18 -14.48606 -13.580099 -14.201513
## CICP27 -14.81378 -14.552991 -14.822714
## LOC729737 -11.37646 -9.940046 -9.555354
## 23854 more rows ...
##
## $df.residual
## [1] 6 6 6 6 6
## 23854 more elements ...
##
## $design
## Brain Liver Lung
## Brain98 1 0 0
## Brain99 1 0 0
## Brain100 1 0 0
## Liver98 0 1 0
## Liver100 0 1 0
## Liver101 0 1 0
## Lung98 0 0 1
## Lung101 0 0 1
## Lung102 0 0 1
## attr(,"assign")
## [1] 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$group
## [1] "contr.treatment"
##
##
## $offset
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 17.46905 17.47534 17.40799 16.8007 16.66784 16.85249 17.52652 17.45769
## [,9]
## [1,] 17.29488
## attr(,"class")
## [1] "CompressedMatrix"
## attr(,"Dims")
## [1] 5 9
## attr(,"repeat.row")
## [1] TRUE
## attr(,"repeat.col")
## [1] FALSE
## 23854 more rows ...
##
## $dispersion
## [1] 0.3926254 0.1708760 0.3996547 0.3919609 0.1639608
## 23854 more elements ...
##
## $prior.count
## [1] 0.125
##
## $AveLogCPM
## [1] -1.0183123 5.1493928 -0.2049942 -1.1096182 5.4448313
## 23854 more elements ...
##
## $df.residual.zeros
## [1] 6 6 6 6 6
## 23854 more elements ...
##
## $df.prior
## [1] 3.942272
##
## $var.post
## MIR6859-1 WASH7P SEPT14P18 CICP27 LOC729737
## 0.5313171 0.2857667 0.6768771 0.8188397 1.4836359
## 23854 more elements ...
##
## $var.prior
## MIR6859-1 WASH7P SEPT14P18 CICP27 LOC729737
## 0.7485407 0.5078020 0.7263704 0.7507707 0.5077233
## 23854 more elements ...
##
## $samples
## group lib.size norm.factors rin slice sex
## Brain98 Brain 32103622 1.2027051 7.8 Brain - Putamen (basal ganglia) 2
## Brain99 Brain 30185833 1.2871794 7.1 Brain - Cerebellum 1
## Brain100 Brain 27603132 1.3159321 6.9 Brain - Hypothalamus 1
## Liver98 Liver 28074401 0.7049238 7 Liver 2
## Liver100 Liver 27188143 0.6373379 6.4 Liver 1
## Liver101 Liver 28357398 0.7349796 6.2 Liver 2
## Lung98 Lung 32883508 1.2436342 6.1 Lung 1
## Lung101 Lung 33948475 1.1244984 7.3 Lung 2
## Lung102 Lung 30516905 1.0629911 6.9 Lung 1
## age rRNA mapped chrm
## Brain98 60-69 0.0333394 89.6 14.41
## Brain99 60-69 0.0106194 92 7.53
## Brain100 60-69 0.0438324 88.7 21.35
## Liver98 60-69 0.0143595 88.7 16.21
## Liver100 40-49 0.0148645 88.3 22.4
## Liver101 50-59 0.0260051 90.5 21.3
## Lung98 60-69 0.00328285 92.6 2.37
## Lung101 60-69 0.00209038 85.8 2.14
## Lung102 60-69 0.00466061 91.2 4.04
The next step is to design the contrasts: I choose what to compare by specifying the corresponding columns of the design matrix:
Brain vs Liver
Brain vs Lung
Liver vs Lung
A contrast is a numeric vector or matrix specifying one or more contrasts of the linear model coefficients to be tested equal to zero. The order of the columns in the design matrix is Brain, Liver, Lung.
qlfBrainLiver <- glmQLFTest(fit, contrast=c(1,-1,0))
qlfBrainLung <- glmQLFTest(fit, contrast=c(1,0,-1))
qlfLiverLung <- glmQLFTest(fit, contrast=c(0,1,-1))
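A contrast vector is simply a set of weights applied to the coefficients. As a sketch, I take the three coefficients printed for WASH7P in the fitted model above and apply the Brain vs Liver contrast by hand. I assume here that the coefficients are on the natural-log scale and that the reported logFC is in log2, so the result is divided by log(2).

```r
# Coefficients of WASH7P copied from the $coefficients table of the fit
coef_wash7p <- c(Brain = -10.12656, Liver = -10.369724, Lung = -10.262623)

# Brain vs Liver: +1 for Brain, -1 for Liver, 0 for Lung
contrast_brain_liver <- c(1, -1, 0)

log_fc_natural <- sum(contrast_brain_liver * coef_wash7p)
log2_fc <- log_fc_natural / log(2)   # convert from natural log to log2
print(round(log2_fc, 3))
```

The positive sign says that this gene is slightly higher in brain than in liver, but the value is far below the threshold of 1 used later.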
topTags extracts the top DE tags in a data frame for a given pair of groups, ranked by p-value or absolute log-fold change:
resultsBrainLiver = topTags(qlfBrainLiver, n = 10000000, adjust.method = "BH", sort.by = "PValue", p.value = 1)
resultBrainLung= topTags(qlfBrainLung, n = 10000000, adjust.method = "BH", sort.by = "PValue", p.value = 1)
resultsLiverLung = topTags(qlfLiverLung , n = 10000000, adjust.method = "BH", sort.by = "PValue", p.value = 1)
I also look at the number of up-regulated, down-regulated and non-significant genes in each comparison:
summary(decideTests(qlfBrainLiver, p.value=0.01, lfc=1))
## 1*Brain -1*Liver
## Down 3906
## NotSig 16307
## Up 3646
summary(decideTests(qlfBrainLung, p.value=0.01, lfc=1))
## 1*Brain -1*Lung
## Down 3169
## NotSig 17954
## Up 2736
summary(decideTests(qlfLiverLung, p.value=0.01, lfc=1))
## 1*Liver -1*Lung
## Down 2411
## NotSig 18935
## Up 2513
Now it is possible to find the genes up-regulated in each tissue with respect to the other two.
To do this I intersect the two tables containing the two comparisons performed for one tissue against the other two. I set some criteria to select the genes:
FDR (false discovery rate) < 0.01 → controls the false discovery rate at a low level: about 1% of the genes called significant are expected to be false positives
logCPM > 0 → prioritizes genes that have a minimum level of expression across the samples analyzed. Genes with very low expression levels may not provide reliable or meaningful information in the context of the study.
LogFC (log2 fold change) > 1 or < -1 → the genes must exhibit a fold change of at least two times (in linear terms) between the compared conditions. This criterion is often used to identify genes with a substantial expression variation between conditions and to exclude genes with small expression differences. A positive logFC indicates higher expression in the first tissue, while a negative logFC indicates higher expression in the second one
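The FDR column comes from the Benjamini-Hochberg (BH) adjustment of the p-values. The following sketch reproduces the adjustment by hand on five made-up p-values and compares it with the base R function p.adjust:

```r
p <- c(0.001, 0.008, 0.039, 0.041, 0.6)   # made-up p-values
n <- length(p)
ord <- order(p)

# BH: p * n / rank, then enforce monotonicity starting from the largest p-value
adj_sorted <- rev(cummin(rev(p[ord] * n / seq_len(n))))
bh_manual  <- pmin(1, adj_sorted)[order(ord)]   # back to the original order

bh_builtin <- p.adjust(p, method = "BH")
print(rbind(bh_manual, bh_builtin))
```

Only the genes whose adjusted value is below 0.01 pass the FDR criterion described above.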
I also delete all the genes that are not useful for my analysis:
LOC: genes for which the official gene symbol is not available
LINC: Long Intergenic Non-Protein Coding
MIR: MicroRNA
SNORD: Small nucleolar RNA
RPL: ribosomal proteins (none were found in the lists below, so no removal was needed)
brain_1 <- rownames(resultsBrainLiver$table %>% filter(logFC > 1 & logCPM > 0 & FDR < 0.01))
brain_2 <- rownames(resultBrainLung$table %>% filter(logFC > 1 & logCPM > 0 & FDR < 0.01))
brain_total <- intersect(brain_1, brain_2)
table(startsWith(brain_total, "RPL"))
##
## FALSE
## 1717
maskBrain <- startsWith(brain_total, "LOC") | startsWith(brain_total,"MIR") | startsWith(brain_total, "LINC") | startsWith(brain_total, "SNORD")
brain_total <- brain_total[!maskBrain]
head(brain_total)
## [1] "NSG1" "BCAN" "NYAP1" "DNAJC6" "MYT1L" "CELF4"
liver_1 <- rownames(resultsBrainLiver$table %>% filter(logFC < -1 & logCPM > 0 & FDR < 0.01))
liver_2 <- rownames(resultsLiverLung$table %>% filter(logFC > 1 & logCPM > 0 & FDR < 0.01))
liver_total <- intersect(liver_1, liver_2)
table(startsWith(liver_total, "RPL"))
##
## FALSE
## 1636
maskLiver <- startsWith(liver_total, "LOC") | startsWith(liver_total,"MIR") | startsWith(liver_total, "LINC") | startsWith(liver_total, "SNORD")
liver_total <- liver_total[!maskLiver]
head(liver_total)
## [1] "PRAP1" "C3P1" "PON1" "CPN2" "AKR1C4" "CFHR1"
lung_1 <- rownames(resultBrainLung$table %>% filter(logFC < -1 & logCPM > 0 & FDR < 0.01))
lung_2 <- rownames(resultsLiverLung$table %>% filter(logFC < -1 & logCPM > 0 & FDR < 0.01))
lung_total <- intersect(lung_1,lung_2)
table(startsWith(lung_total, "RPL"))
##
## FALSE
## 1006
maskLung <- startsWith(lung_total, "LOC") | startsWith(lung_total,"MIR") | startsWith(lung_total, "LINC") | startsWith(lung_total, "SNORD")
lung_total <- lung_total[!maskLung]
head(lung_total)
## [1] "TCF21" "IDO1" "TBX2" "SLC11A1" "FENDRR" "ITGA1"
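The same intersect-and-mask steps are repeated three times, so they could be collected in one function. This is only a sketch: `tissue_specific` is a hypothetical helper, the input lists are made up, and unlike the code above it also removes the RPL prefix.

```r
# Hypothetical helper: genes present in both lists, minus the unwanted prefixes
tissue_specific <- function(up_vs_first, up_vs_second,
                            prefixes = c("LOC", "LINC", "MIR", "SNORD", "RPL")) {
  shared <- intersect(up_vs_first, up_vs_second)
  # One regular expression matching any of the prefixes at the start of the name
  pattern <- paste0("^(", paste(prefixes, collapse = "|"), ")")
  shared[!grepl(pattern, shared)]
}

# Made-up input lists, only to show the behaviour
up_a <- c("NSG1", "LOC105379272", "BCAN", "MIR6859-1", "RPL13")
up_b <- c("BCAN", "NSG1", "LOC105379272", "RPL13", "CELF4")
demo_genes <- tissue_specific(up_a, up_b)
print(demo_genes)
```

Using one function for the three tissues avoids copy-and-paste mistakes such as reusing a mask variable created for another tissue.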
I select one gene that is differentially expressed in one tissue against the other two and check its transcripts in the UCSC Genome Browser. In this case I choose NSG1.
This is the screenshot of its alternative transcripts:

It is possible to notice some alternative splicing events:
Alternative TSS (transcription start site)
Alternative TTS (transcription termination site)
Cassette exon
which(rowData(rse_brain)$gene_name == "NSG1") #38639
## [1] 38639
This gene is more expressed in brain than in liver and lung, where it seems not to be expressed. It is possible to double check this with an appropriate statistical test:
assays(rse_brain)$TPM <- recount::getTPM(rse_brain)
assays(rse_lung)$TPM <- recount::getTPM(rse_lung)
assays(rse_liver)$TPM <- recount::getTPM(rse_liver)
df_b=data.frame(TPM=assays(rse_brain)$TPM[38639,],group="Brain")
df_lu=data.frame(TPM=assays(rse_lung)$TPM[38639,],group="Lung")
df_li=data.frame(TPM=assays(rse_liver)$TPM[38639,],group="Liver")
data_NSG1=rbind(df_b,df_lu,df_li)
#Statistical test
res_kruskal <- data_NSG1 %>% kruskal_test(TPM ~ group)
res_kruskal
## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 TPM 3837 2067. 2 0 Kruskal-Wallis
A p-value of 0 in the Kruskal-Wallis test indicates an extremely significant difference in gene expression distributions among the three tissues being compared. The value is not exactly zero: it is smaller than the smallest number the computer can represent, so it is printed as 0. In practical terms, it is extremely unlikely that the observed differences are due to random chance.
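The reported 0 is a floating-point underflow, and this can be checked directly. The sketch below assumes that the p-value comes from the chi-squared approximation with 2 degrees of freedom (three groups minus one) and uses the rounded statistic printed above:

```r
stat <- 2067   # Kruskal-Wallis statistic reported above (rounded)

# On the ordinary scale the upper-tail probability underflows to 0
p_plain <- pchisq(stat, df = 2, lower.tail = FALSE)

# On the log scale the same probability is still representable
log_p   <- pchisq(stat, df = 2, lower.tail = FALSE, log.p = TRUE)
log10_p <- log_p / log(10)

print(p_plain)
print(log10_p)
```

The log10 value is in the order of minus several hundreds, which is far below what a double-precision number can store.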
I represent this result with a boxplot:
pwc2=data_NSG1 %>% wilcox_test(TPM ~ group, p.adjust.method = "BH")
pwc2
## # A tibble: 3 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 TPM Brain Liver 2931 251 735031 2.80e-152 4.20e-152 ****
## 2 TPM Brain Lung 2931 655 1911432 0 0 ****
## 3 TPM Liver Lung 251 655 17892 2.33e- 74 2.33e- 74 ****
pwc = pwc2 %>% add_xy_position(x = "group")
ggboxplot(data_NSG1, x = "group", y = "TPM",outlier.shape = NA,width = 0.5,title="NSG1 expression across tissues", fill = "#0099FF") +
stat_pvalue_manual(pwc,y.position = c(400,400,400)) +
labs(subtitle = get_test_label(res_kruskal, detailed = TRUE),caption = get_pwc_label(pwc))
I repeat the same check for a liver gene, SERPINA6:
which(rowData(rse_liver)$gene_name == "SERPINA6") #39648
## [1] 3648 19946
which(rowData(rse_brain)$gene_name == "ADH6") #9763
## [1] 39648
which(rowData(rse_lung)$gene_name == "ADH6") #9763
## [1] 39648
df_b=data.frame(TPM=assays(rse_brain)$TPM[19946,],group="Brain")
df_lu=data.frame(TPM=assays(rse_lung)$TPM[19946,],group="Lung")
df_li=data.frame(TPM=assays(rse_liver)$TPM[19946,],group="Liver")
data_PON1=rbind(df_b,df_lu,df_li)
#Statistical test
res_kruskal <- data_PON1 %>% kruskal_test(TPM ~ group)
res_kruskal
## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 TPM 3837 1500. 2 0 Kruskal-Wallis
pwc2=data_PON1 %>% wilcox_test(TPM ~ group, p.adjust.method = "BH")
pwc2
## # A tibble: 3 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 TPM Brain Liver 2931 251 131 2.12e-242 6.36e-242 ****
## 2 TPM Brain Lung 2931 655 456308 9.25e-147 1.39e-146 ****
## 3 TPM Liver Lung 251 655 164160 6.91e-121 6.91e-121 ****
pwc = pwc2 %>% add_xy_position(x = "group")
ggboxplot(data_PON1, x = "group", y = "TPM",outlier.shape = NA,width = 0.5,title="SERPINA6 expression across tissues", fill = "#0099FF") +
stat_pvalue_manual(pwc,y.position = c(400,400,400)) +
labs(subtitle = get_test_label(res_kruskal, detailed = TRUE),caption = get_pwc_label(pwc))
I split the DE genes between those “up”-regulated and “down”-regulated in the experiment, according to the sign of the log-fold change (positive or negative). Then I compare their overlap with all the GO terms, and evaluate the enrichment of each GO term with the corresponding (corrected) p-value.
The first step is to load the package and set Enrichr as my target for the enrichment analysis:
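The enrichment of a single GO term can be thought of as an overlap test between my gene list and the genes annotated to that term. The sketch below uses made-up numbers and a one-sided hypergeometric test, which is equivalent to a one-sided Fisher exact test. I assume that Enrichr works in a similar way, but its background gene set and exact procedure are not reproduced here.

```r
universe_size <- 20000  # assumed number of annotated genes (background)
term_size     <- 200    # genes annotated to the GO term
list_size     <- 1000   # genes in my list
overlap       <- 30     # genes that are in both

# Overlap expected by chance alone
expected <- list_size * term_size / universe_size

# P(X >= overlap) under the hypergeometric distribution
p_enrich <- phyper(overlap - 1, term_size, universe_size - term_size,
                   list_size, lower.tail = FALSE)

# The same test written as a 2x2 Fisher exact test
tab <- matrix(c(overlap, term_size - overlap,
                list_size - overlap,
                universe_size - term_size - (list_size - overlap)),
              nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(expected = expected, p_enrich = p_enrich, p_fisher = p_fisher))
```

Since one test is done for every GO term, the p-values then need a multiple-testing correction, which is why the corrected p-value is the one to look at.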
library('enrichR')
## Welcome to enrichR
## Checking connection ...
## Enrichr ... Connection is Live!
## FlyEnrichr ... Connection is Live!
## WormEnrichr ... Connection is Live!
## YeastEnrichr ... Connection is Live!
## FishEnrichr ... Connection is Live!
## OxEnrichr ... Connection is Live!
setEnrichrSite("Enrichr")
## Connection changed to https://maayanlab.cloud/Enrichr/
## Connection is Live!
websiteLive <- TRUE
Then I can start with the genes up-regulated in brain:
dbs_ontologies <- c("GO_Biological_Process_2023", "GO_Molecular_Function_2023", "GO_Cellular_Component_2023")
if (websiteLive) {
enriched_ontologies <- enrichr(brain_total, dbs_ontologies)
}
## Uploading data to Enrichr... Done.
## Querying GO_Biological_Process_2023... Done.
## Querying GO_Molecular_Function_2023... Done.
## Querying GO_Cellular_Component_2023... Done.
## Parsing results... Done.
if (websiteLive) plotEnrich(title = "Enriched terms of GO Biological Process 2023 database", enriched_ontologies[[1]], showTerms = 5, numChar = 40, y = "Count", orderBy = "P.value")
if (websiteLive) plotEnrich(title = "Enriched terms of GO Molecular Function 2023 database", enriched_ontologies[[2]], showTerms = 5, numChar = 40, y = "Count", orderBy = "P.value")
if (websiteLive) plotEnrich(title = "Enriched terms of GO Cellular Component 2023 database", enriched_ontologies[[3]], showTerms = 5, numChar = 40, y = "Count", orderBy = "P.value")
if (websiteLive) {
enriched_ontologies <- enrichr(liver_total, dbs_ontologies)
}
## Uploading data to Enrichr... Done.
## Querying GO_Biological_Process_2023... Done.
## Querying GO_Molecular_Function_2023... Done.
## Querying GO_Cellular_Component_2023... Done.
## Parsing results... Done.
if (websiteLive) plotEnrich(title = "Enriched terms of GO Biological Process 2023 database", enriched_ontologies[[1]], showTerms = 5, numChar = 40, y = "Count", orderBy = "P.value")
if (websiteLive) plotEnrich(title = "Enriched terms of GO Molecular Function 2023 database", enriched_ontologies[[2]], showTerms = 5, numChar = 50, y = "Count", orderBy = "P.value")
if (websiteLive) plotEnrich(title = "Enriched terms of GO Cellular Component 2023 database", enriched_ontologies[[3]], showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value")
if (websiteLive) {
enriched_ontologies <- enrichr(lung_total, dbs_ontologies)
}
## Uploading data to Enrichr... Done.
## Querying GO_Biological_Process_2023... Done.
## Querying GO_Molecular_Function_2023... Done.
## Querying GO_Cellular_Component_2023... Done.
## Parsing results... Done.
if (websiteLive) plotEnrich(title = "Enriched terms of GO Biological Process 2023 database", enriched_ontologies[[1]], showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value")
if (websiteLive) plotEnrich(title = "Enriched terms of GO Molecular Function 2023 database", enriched_ontologies[[2]], showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value")
if (websiteLive) plotEnrich(title = "Enriched terms of GO Cellular Component 2023 database", enriched_ontologies[[3]], showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value")
The results for brain are quite clear, while for liver and lung it is still not possible to identify the original tissue just by looking at the GO terms. So I decided to perform additional analyses:
available_databases <- listEnrichrDbs()
print(available_databases)
## geneCoverage genesPerTerm
## 1 13362 275
## 2 27884 1284
## 3 6002 77
## 4 47172 1370
## 5 47107 509
## 6 21493 3713
## 7 1295 18
## 8 3185 73
## 9 2854 34
## 10 15057 300
## 11 4128 48
## 12 34061 641
## 13 7504 155
## 14 16399 247
## 15 12753 57
## 16 23726 127
## 17 32740 85
## 18 13373 258
## 19 19270 388
## 20 13236 82
## 21 14264 58
## 22 3096 31
## 23 22288 4368
## 24 4533 37
## 25 10231 158
## 26 2741 5
## 27 5655 342
## 28 10406 715
## 29 10493 200
## 30 11251 100
## 31 8695 100
## 32 1759 25
## 33 2178 89
## 34 851 15
## 35 10061 106
## 36 11250 166
## 37 15406 300
## 38 17711 300
## 39 17576 300
## 40 15797 176
## 41 12232 343
## 42 13572 301
## 43 6454 301
## 44 3723 47
## 45 7588 35
## 46 7682 78
## 47 7324 172
## 48 8469 122
## 49 13121 305
## 50 26382 1811
## 51 29065 2123
## 52 280 9
## 53 13877 304
## 54 15852 912
## 55 4320 129
## 56 4271 128
## 57 10496 201
## 58 1678 21
## 59 756 12
## 60 3800 48
## 61 2541 39
## 62 1918 39
## 63 5863 51
## 64 6768 47
## 65 25651 807
## 66 19129 1594
## 67 23939 293
## 68 23561 307
## 69 23877 302
## 70 15886 9
## 71 24350 299
## 72 3102 25
## 73 31132 298
## 74 30832 302
## 75 48230 1429
## 76 5613 36
## 77 9559 73
## 78 9448 63
## 79 16725 1443
## 80 19249 1443
## 81 15090 282
## 82 16129 292
## 83 15309 308
## 84 15103 318
## 85 15022 290
## 86 15676 310
## 87 15854 279
## 88 15015 321
## 89 3788 159
## 90 3357 153
## 91 12668 300
## 92 12638 300
## 93 8973 64
## 94 7010 87
## 95 5966 51
## 96 15562 887
## 97 17850 300
## 98 17660 300
## 99 1348 19
## 100 934 13
## 101 2541 39
## 102 2041 42
## 103 5209 300
## 104 49238 1550
## 105 2243 19
## 106 19586 545
## 107 22440 505
## 108 8184 24
## 109 18329 161
## 110 15755 28
## 111 10271 22
## 112 10427 38
## 113 10601 25
## 114 13822 21
## 115 8002 143
## 116 10089 45
## 117 13247 49
## 118 21809 2316
## 119 23601 2395
## 120 20883 299
## 121 19612 299
## 122 25983 299
## 123 19500 137
## 124 14893 128
## 125 17598 1208
## 126 5902 109
## 127 12486 299
## 128 1073 100
## 129 19513 117
## 130 14433 36
## 131 8655 61
## 132 11459 39
## 133 19741 270
## 134 27360 802
## 135 13072 26
## 136 13464 45
## 137 13787 200
## 138 13929 200
## 139 16964 200
## 140 17258 200
## 141 10352 58
## 142 10471 76
## 143 12419 491
## 144 19378 37
## 145 6201 45
## 146 4558 54
## 147 3264 22
## 148 7802 92
## 149 8551 98
## 150 12444 23
## 151 9000 20
## 152 7744 363
## 153 6204 387
## 154 13420 32
## 155 14148 122
## 156 9813 49
## 157 1397 13
## 158 9116 22
## 159 17464 63
## 160 394 73
## 161 11851 586
## 162 8189 421
## 163 18704 100
## 164 5605 39
## 165 5718 31
## 166 14156 40
## 167 16979 295
## 168 4383 146
## 169 54974 483
## 170 12118 448
## 171 12361 124
## 172 9763 139
## 173 8078 102
## 174 7173 43
## 175 5833 100
## 176 14937 33
## 177 11497 80
## 178 11936 34
## 179 9767 33
## 180 14167 80
## 181 17851 102
## 182 16853 360
## 183 6654 136
## 184 1683 10
## 185 20414 112
## 186 26076 250
## 187 26338 250
## 188 25381 250
## 189 25409 250
## 190 11980 250
## 191 31158 805
## 192 30006 815
## 193 13370 103
## 194 13697 343
## 195 2183 18
## 196 12765 13
## 197 1509 100
## 198 18365 1214
## 199 13525 175
## 200 9525 245
## 201 9440 245
## 202 3857 80
## 203 10489 61
## 204 1198 23
## 205 1882 47
## 206 1552 16
## 207 6713 68
## 208 936 15
## 209 8220 146
## 210 9021 793
## 211 8076 96
## 212 14698 33
## 213 10972 85
## 214 12126 38
## 215 13662 12
## 216 18290 34
## 217 12081 50
## 218 12853 485
## libraryName
## 1 Genome_Browser_PWMs
## 2 TRANSFAC_and_JASPAR_PWMs
## 3 Transcription_Factor_PPIs
## 4 ChEA_2013
## 5 Drug_Perturbations_from_GEO_2014
## 6 ENCODE_TF_ChIP-seq_2014
## 7 BioCarta_2013
## 8 Reactome_2013
## 9 WikiPathways_2013
## 10 Disease_Signatures_from_GEO_up_2014
## 11 KEGG_2013
## 12 TF-LOF_Expression_from_GEO
## 13 TargetScan_microRNA
## 14 PPI_Hub_Proteins
## 15 GO_Molecular_Function_2015
## 16 GeneSigDB
## 17 Chromosome_Location
## 18 Human_Gene_Atlas
## 19 Mouse_Gene_Atlas
## 20 GO_Cellular_Component_2015
## 21 GO_Biological_Process_2015
## 22 Human_Phenotype_Ontology
## 23 Epigenomics_Roadmap_HM_ChIP-seq
## 24 KEA_2013
## 25 NURSA_Human_Endogenous_Complexome
## 26 CORUM
## 27 SILAC_Phosphoproteomics
## 28 MGI_Mammalian_Phenotype_Level_3
## 29 MGI_Mammalian_Phenotype_Level_4
## 30 Old_CMAP_up
## 31 Old_CMAP_down
## 32 OMIM_Disease
## 33 OMIM_Expanded
## 34 VirusMINT
## 35 MSigDB_Computational
## 36 MSigDB_Oncogenic_Signatures
## 37 Disease_Signatures_from_GEO_down_2014
## 38 Virus_Perturbations_from_GEO_up
## 39 Virus_Perturbations_from_GEO_down
## 40 Cancer_Cell_Line_Encyclopedia
## 41 NCI-60_Cancer_Cell_Lines
## 42 Tissue_Protein_Expression_from_ProteomicsDB
## 43 Tissue_Protein_Expression_from_Human_Proteome_Map
## 44 HMDB_Metabolites
## 45 Pfam_InterPro_Domains
## 46 GO_Biological_Process_2013
## 47 GO_Cellular_Component_2013
## 48 GO_Molecular_Function_2013
## 49 Allen_Brain_Atlas_up
## 50 ENCODE_TF_ChIP-seq_2015
## 51 ENCODE_Histone_Modifications_2015
## 52 Phosphatase_Substrates_from_DEPOD
## 53 Allen_Brain_Atlas_down
## 54 ENCODE_Histone_Modifications_2013
## 55 Achilles_fitness_increase
## 56 Achilles_fitness_decrease
## 57 MGI_Mammalian_Phenotype_2013
## 58 BioCarta_2015
## 59 HumanCyc_2015
## 60 KEGG_2015
## 61 NCI-Nature_2015
## 62 Panther_2015
## 63 WikiPathways_2015
## 64 Reactome_2015
## 65 ESCAPE
## 66 HomoloGene
## 67 Disease_Perturbations_from_GEO_down
## 68 Disease_Perturbations_from_GEO_up
## 69 Drug_Perturbations_from_GEO_down
## 70 Genes_Associated_with_NIH_Grants
## 71 Drug_Perturbations_from_GEO_up
## 72 KEA_2015
## 73 Gene_Perturbations_from_GEO_up
## 74 Gene_Perturbations_from_GEO_down
## 75 ChEA_2015
## 76 dbGaP
## 77 LINCS_L1000_Chem_Pert_up
## 78 LINCS_L1000_Chem_Pert_down
## 79 GTEx_Tissue_Expression_Down
## 80 GTEx_Tissue_Expression_Up
## 81 Ligand_Perturbations_from_GEO_down
## 82 Aging_Perturbations_from_GEO_down
## 83 Aging_Perturbations_from_GEO_up
## 84 Ligand_Perturbations_from_GEO_up
## 85 MCF7_Perturbations_from_GEO_down
## 86 MCF7_Perturbations_from_GEO_up
## 87 Microbe_Perturbations_from_GEO_down
## 88 Microbe_Perturbations_from_GEO_up
## 89 LINCS_L1000_Ligand_Perturbations_down
## 90 LINCS_L1000_Ligand_Perturbations_up
## 91 L1000_Kinase_and_GPCR_Perturbations_down
## 92 L1000_Kinase_and_GPCR_Perturbations_up
## 93 Reactome_2016
## 94 KEGG_2016
## 95 WikiPathways_2016
## 96 ENCODE_and_ChEA_Consensus_TFs_from_ChIP-X
## 97 Kinase_Perturbations_from_GEO_down
## 98 Kinase_Perturbations_from_GEO_up
## 99 BioCarta_2016
## 100 HumanCyc_2016
## 101 NCI-Nature_2016
## 102 Panther_2016
## 103 DrugMatrix
## 104 ChEA_2016
## 105 huMAP
## 106 Jensen_TISSUES
## 107 RNA-Seq_Disease_Gene_and_Drug_Signatures_from_GEO
## 108 MGI_Mammalian_Phenotype_2017
## 109 Jensen_COMPARTMENTS
## 110 Jensen_DISEASES
## 111 BioPlex_2017
## 112 GO_Cellular_Component_2017
## 113 GO_Molecular_Function_2017
## 114 GO_Biological_Process_2017
## 115 GO_Cellular_Component_2017b
## 116 GO_Molecular_Function_2017b
## 117 GO_Biological_Process_2017b
## 118 ARCHS4_Tissues
## 119 ARCHS4_Cell-lines
## 120 ARCHS4_IDG_Coexp
## 121 ARCHS4_Kinases_Coexp
## 122 ARCHS4_TFs_Coexp
## 123 SysMyo_Muscle_Gene_Sets
## 124 miRTarBase_2017
## 125 TargetScan_microRNA_2017
## 126 Enrichr_Libraries_Most_Popular_Genes
## 127 Enrichr_Submissions_TF-Gene_Coocurrence
## 128 Data_Acquisition_Method_Most_Popular_Genes
## 129 DSigDB
## 130 GO_Biological_Process_2018
## 131 GO_Cellular_Component_2018
## 132 GO_Molecular_Function_2018
## 133 TF_Perturbations_Followed_by_Expression
## 134 Chromosome_Location_hg19
## 135 NIH_Funded_PIs_2017_Human_GeneRIF
## 136 NIH_Funded_PIs_2017_Human_AutoRIF
## 137 Rare_Diseases_AutoRIF_ARCHS4_Predictions
## 138 Rare_Diseases_GeneRIF_ARCHS4_Predictions
## 139 NIH_Funded_PIs_2017_AutoRIF_ARCHS4_Predictions
## 140 NIH_Funded_PIs_2017_GeneRIF_ARCHS4_Predictions
## 141 Rare_Diseases_GeneRIF_Gene_Lists
## 142 Rare_Diseases_AutoRIF_Gene_Lists
## 143 SubCell_BarCode
## 144 GWAS_Catalog_2019
## 145 WikiPathways_2019_Human
## 146 WikiPathways_2019_Mouse
## 147 TRRUST_Transcription_Factors_2019
## 148 KEGG_2019_Human
## 149 KEGG_2019_Mouse
## 150 InterPro_Domains_2019
## 151 Pfam_Domains_2019
## 152 DepMap_WG_CRISPR_Screens_Broad_CellLines_2019
## 153 DepMap_WG_CRISPR_Screens_Sanger_CellLines_2019
## 154 MGI_Mammalian_Phenotype_Level_4_2019
## 155 UK_Biobank_GWAS_v1
## 156 BioPlanet_2019
## 157 ClinVar_2019
## 158 PheWeb_2019
## 159 DisGeNET
## 160 HMS_LINCS_KinomeScan
## 161 CCLE_Proteomics_2020
## 162 ProteomicsDB_2020
## 163 lncHUB_lncRNA_Co-Expression
## 164 Virus-Host_PPI_P-HIPSTer_2020
## 165 Elsevier_Pathway_Collection
## 166 Table_Mining_of_CRISPR_Studies
## 167 COVID-19_Related_Gene_Sets
## 168 MSigDB_Hallmark_2020
## 169 Enrichr_Users_Contributed_Lists_2020
## 170 TG_GATES_2020
## 171 Allen_Brain_Atlas_10x_scRNA_2021
## 172 Descartes_Cell_Types_and_Tissue_2021
## 173 KEGG_2021_Human
## 174 WikiPathway_2021_Human
## 175 HuBMAP_ASCT_plus_B_augmented_w_RNAseq_Coexpression
## 176 GO_Biological_Process_2021
## 177 GO_Cellular_Component_2021
## 178 GO_Molecular_Function_2021
## 179 MGI_Mammalian_Phenotype_Level_4_2021
## 180 CellMarker_Augmented_2021
## 181 Orphanet_Augmented_2021
## 182 COVID-19_Related_Gene_Sets_2021
## 183 PanglaoDB_Augmented_2021
## 184 Azimuth_Cell_Types_2021
## 185 PhenGenI_Association_2021
## 186 RNAseq_Automatic_GEO_Signatures_Human_Down
## 187 RNAseq_Automatic_GEO_Signatures_Human_Up
## 188 RNAseq_Automatic_GEO_Signatures_Mouse_Down
## 189 RNAseq_Automatic_GEO_Signatures_Mouse_Up
## 190 GTEx_Aging_Signatures_2021
## 191 HDSigDB_Human_2021
## 192 HDSigDB_Mouse_2021
## 193 HuBMAP_ASCTplusB_augmented_2022
## 194 FANTOM6_lncRNA_KD_DEGs
## 195 MAGMA_Drugs_and_Diseases
## 196 PFOCR_Pathways
## 197 Tabula_Sapiens
## 198 ChEA_2022
## 199 Diabetes_Perturbations_GEO_2022
## 200 LINCS_L1000_Chem_Pert_Consensus_Sigs
## 201 LINCS_L1000_CRISPR_KO_Consensus_Sigs
## 202 Tabula_Muris
## 203 Reactome_2022
## 204 SynGO_2022
## 205 GlyGen_Glycosylated_Proteins_2022
## 206 IDG_Drug_Targets_2022
## 207 KOMP2_Mouse_Phenotypes_2022
## 208 Metabolomics_Workbench_Metabolites_2022
## 209 Proteomics_Drug_Atlas_2023
## 210 The_Kinase_Library_2023
## 211 GTEx_Tissues_V8_2023
## 212 GO_Biological_Process_2023
## 213 GO_Cellular_Component_2023
## 214 GO_Molecular_Function_2023
## 215 PFOCR_Pathways_2023
## 216 GWAS_Catalog_2023
## 217 GeDiPNet_2023
## 218 MAGNET_2023
## link
## 1 http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/
## 2 http://jaspar.genereg.net/html/DOWNLOAD/
## 3
## 4 http://amp.pharm.mssm.edu/lib/cheadownload.jsp
## 5 http://www.ncbi.nlm.nih.gov/geo/
## 6 http://genome.ucsc.edu/ENCODE/downloads.html
## 7 https://cgap.nci.nih.gov/Pathways/BioCarta_Pathways
## 8 http://www.reactome.org/download/index.html
## 9 http://www.wikipathways.org/index.php/Download_Pathways
## 10 http://www.ncbi.nlm.nih.gov/geo/
## 11 http://www.kegg.jp/kegg/download/
## 12 http://www.ncbi.nlm.nih.gov/geo/
## 13 http://www.targetscan.org/cgi-bin/targetscan/data_download.cgi?db=vert_61
## 14 http://amp.pharm.mssm.edu/X2K
## 15 http://www.geneontology.org/GO.downloads.annotations.shtml
## 16 https://pubmed.ncbi.nlm.nih.gov/22110038/
## 17 http://software.broadinstitute.org/gsea/msigdb/index.jsp
## 18 http://biogps.org/downloads/
## 19 http://biogps.org/downloads/
## 20 http://www.geneontology.org/GO.downloads.annotations.shtml
## 21 http://www.geneontology.org/GO.downloads.annotations.shtml
## 22 http://www.human-phenotype-ontology.org/
## 23 http://www.roadmapepigenomics.org/
## 24 http://amp.pharm.mssm.edu/lib/keacommandline.jsp
## 25 https://www.nursa.org/nursa/index.jsf
## 26 http://mips.helmholtz-muenchen.de/genre/proj/corum/
## 27 http://amp.pharm.mssm.edu/lib/keacommandline.jsp
## 28 http://www.informatics.jax.org/
## 29 http://www.informatics.jax.org/
## 30 http://www.broadinstitute.org/cmap/
## 31 http://www.broadinstitute.org/cmap/
## 32 http://www.omim.org/downloads
## 33 http://www.omim.org/downloads
## 34 http://mint.bio.uniroma2.it/download.html
## 35 http://www.broadinstitute.org/gsea/msigdb/collections.jsp
## 36 http://www.broadinstitute.org/gsea/msigdb/collections.jsp
## 37 http://www.ncbi.nlm.nih.gov/geo/
## 38 http://www.ncbi.nlm.nih.gov/geo/
## 39 http://www.ncbi.nlm.nih.gov/geo/
## 40 https://portals.broadinstitute.org/ccle/home\n
## 41 http://biogps.org/downloads/
## 42 https://www.proteomicsdb.org/
## 43 http://www.humanproteomemap.org/index.php
## 44 http://www.hmdb.ca/downloads
## 45 ftp://ftp.ebi.ac.uk/pub/databases/interpro/
## 46 http://www.geneontology.org/GO.downloads.annotations.shtml
## 47 http://www.geneontology.org/GO.downloads.annotations.shtml
## 48 http://www.geneontology.org/GO.downloads.annotations.shtml
## 49 http://www.brain-map.org/
## 50 http://genome.ucsc.edu/ENCODE/downloads.html
## 51 http://genome.ucsc.edu/ENCODE/downloads.html
## 52 http://www.koehn.embl.de/depod/
## 53 http://www.brain-map.org/
## 54 http://genome.ucsc.edu/ENCODE/downloads.html
## 55 http://www.broadinstitute.org/achilles
## 56 http://www.broadinstitute.org/achilles
## 57 http://www.informatics.jax.org/
## 58 https://cgap.nci.nih.gov/Pathways/BioCarta_Pathways
## 59 http://humancyc.org/
## 60 http://www.kegg.jp/kegg/download/
## 61 http://pid.nci.nih.gov/
## 62 http://www.pantherdb.org/
## 63 http://www.wikipathways.org/index.php/Download_Pathways
## 64 http://www.reactome.org/download/index.html
## 65 http://www.maayanlab.net/ESCAPE/
## 66 http://www.ncbi.nlm.nih.gov/homologene
## 67 http://www.ncbi.nlm.nih.gov/geo/
## 68 http://www.ncbi.nlm.nih.gov/geo/
## 69 http://www.ncbi.nlm.nih.gov/geo/
## 70 https://grants.nih.gov/grants/oer.htm\n
## 71 http://www.ncbi.nlm.nih.gov/geo/
## 72 http://amp.pharm.mssm.edu/Enrichr
## 73 http://www.ncbi.nlm.nih.gov/geo/
## 74 http://www.ncbi.nlm.nih.gov/geo/
## 75 http://amp.pharm.mssm.edu/Enrichr
## 76 http://www.ncbi.nlm.nih.gov/gap
## 77 https://clue.io/
## 78 https://clue.io/
## 79 http://www.gtexportal.org/
## 80 http://www.gtexportal.org/
## 81 http://www.ncbi.nlm.nih.gov/geo/
## 82 http://www.ncbi.nlm.nih.gov/geo/
## 83 http://www.ncbi.nlm.nih.gov/geo/
## 84 http://www.ncbi.nlm.nih.gov/geo/
## 85 http://www.ncbi.nlm.nih.gov/geo/
## 86 http://www.ncbi.nlm.nih.gov/geo/
## 87 http://www.ncbi.nlm.nih.gov/geo/
## 88 http://www.ncbi.nlm.nih.gov/geo/
## 89 https://clue.io/
## 90 https://clue.io/
## 91 https://clue.io/
## 92 https://clue.io/
## 93 http://www.reactome.org/download/index.html
## 94 http://www.kegg.jp/kegg/download/
## 95 http://www.wikipathways.org/index.php/Download_Pathways
## 96
## 97 http://www.ncbi.nlm.nih.gov/geo/
## 98 http://www.ncbi.nlm.nih.gov/geo/
## 99 http://cgap.nci.nih.gov/Pathways/BioCarta_Pathways
## 100 http://humancyc.org/
## 101 http://pid.nci.nih.gov/
## 102 http://www.pantherdb.org/pathway/
## 103 https://ntp.niehs.nih.gov/drugmatrix/
## 104 http://amp.pharm.mssm.edu/Enrichr
## 105 http://proteincomplexes.org/
## 106 http://tissues.jensenlab.org/
## 107 http://www.ncbi.nlm.nih.gov/geo/
## 108 http://www.informatics.jax.org/
## 109 http://compartments.jensenlab.org/
## 110 http://diseases.jensenlab.org/
## 111 http://bioplex.hms.harvard.edu/
## 112 http://www.geneontology.org/
## 113 http://www.geneontology.org/
## 114 http://www.geneontology.org/
## 115 http://www.geneontology.org/
## 116 http://www.geneontology.org/
## 117 http://www.geneontology.org/
## 118 http://amp.pharm.mssm.edu/archs4
## 119 http://amp.pharm.mssm.edu/archs4
## 120 http://amp.pharm.mssm.edu/archs4
## 121 http://amp.pharm.mssm.edu/archs4
## 122 http://amp.pharm.mssm.edu/archs4
## 123 http://sys-myo.rhcloud.com/
## 124 http://mirtarbase.mbc.nctu.edu.tw/
## 125 http://www.targetscan.org/
## 126 http://amp.pharm.mssm.edu/Enrichr
## 127 http://amp.pharm.mssm.edu/Enrichr
## 128 http://amp.pharm.mssm.edu/Enrichr
## 129 http://tanlab.ucdenver.edu/DSigDB/DSigDBv1.0/
## 130 http://www.geneontology.org/
## 131 http://www.geneontology.org/
## 132 http://www.geneontology.org/
## 133 http://www.ncbi.nlm.nih.gov/geo/
## 134 http://hgdownload.cse.ucsc.edu/downloads.html
## 135 https://www.ncbi.nlm.nih.gov/pubmed/
## 136 https://www.ncbi.nlm.nih.gov/pubmed/
## 137 https://amp.pharm.mssm.edu/geneshot/
## 138 https://www.ncbi.nlm.nih.gov/gene/about-generif
## 139 https://www.ncbi.nlm.nih.gov/pubmed/
## 140 https://www.ncbi.nlm.nih.gov/pubmed/
## 141 https://www.ncbi.nlm.nih.gov/gene/about-generif
## 142 https://amp.pharm.mssm.edu/geneshot/
## 143 http://www.subcellbarcode.org/
## 144 https://www.ebi.ac.uk/gwas
## 145 https://www.wikipathways.org/
## 146 https://www.wikipathways.org/
## 147 https://www.grnpedia.org/trrust/
## 148 https://www.kegg.jp/
## 149 https://www.kegg.jp/
## 150 https://www.ebi.ac.uk/interpro/
## 151 https://pfam.xfam.org/
## 152 https://depmap.org/
## 153 https://depmap.org/
## 154 http://www.informatics.jax.org/
## 155 https://www.ukbiobank.ac.uk/tag/gwas/
## 156 https://tripod.nih.gov/bioplanet/
## 157 https://www.ncbi.nlm.nih.gov/clinvar/
## 158 http://pheweb.sph.umich.edu/
## 159 https://www.disgenet.org
## 160 http://lincs.hms.harvard.edu/kinomescan/
## 161 https://portals.broadinstitute.org/ccle
## 162 https://www.proteomicsdb.org/
## 163 https://amp.pharm.mssm.edu/lnchub/
## 164 http://phipster.org/
## 165 http://www.transgene.ru/disease-pathways/
## 166
## 167 https://amp.pharm.mssm.edu/covid19
## 168 https://www.gsea-msigdb.org/gsea/msigdb/collections.jsp
## 169 https://maayanlab.cloud/Enrichr
## 170 https://toxico.nibiohn.go.jp/english/
## 171 https://portal.brain-map.org/
## 172 https://descartes.brotmanbaty.org/bbi/human-gene-expression-during-development/
## 173 https://www.kegg.jp/
## 174 https://www.wikipathways.org/
## 175 https://hubmapconsortium.github.io/ccf-asct-reporter/
## 176 http://www.geneontology.org/
## 177 http://www.geneontology.org/
## 178 http://www.geneontology.org/
## 179 http://www.informatics.jax.org/
## 180 http://biocc.hrbmu.edu.cn/CellMarker/
## 181 http://www.orphadata.org/
## 182 https://maayanlab.cloud/covid19/
## 183 https://panglaodb.se/
## 184 https://azimuth.hubmapconsortium.org/
## 185 https://www.ncbi.nlm.nih.gov/gap/phegeni
## 186 https://maayanlab.cloud/archs4/
## 187 https://maayanlab.cloud/archs4/
## 188 https://maayanlab.cloud/archs4/
## 189 https://maayanlab.cloud/archs4/
## 190 https://gtexportal.org/
## 191 https://www.hdinhd.org/
## 192 https://www.hdinhd.org/
## 193 https://hubmapconsortium.github.io/ccf-asct-reporter/
## 194 https://fantom.gsc.riken.jp/6/
## 195 https://github.com/nybell/drugsets/tree/main/DATA/GENESETS
## 196 https://pfocr.wikipathways.org/
## 197 https://tabula-sapiens-portal.ds.czbiohub.org/
## 198 https://maayanlab.cloud/chea3/
## 199 https://appyters.maayanlab.cloud/#/Gene_Expression_T2D_Signatures
## 200 https://maayanlab.cloud/sigcom-lincs/#/Download
## 201 https://maayanlab.cloud/sigcom-lincs/#/Download
## 202 https://tabula-muris.ds.czbiohub.org/
## 203 https://reactome.org/download-data
## 204 https://www.syngoportal.org/
## 205 https://www.glygen.org/
## 206 https://drugcentral.org/
## 207 https://www.mousephenotype.org/
## 208 https://www.metabolomicsworkbench.org/
## 209 https://www.nature.com/articles/s41587-022-01539-0
## 210 https://kinase-library.phosphosite.org/site
## 211 https://gtexportal.org/home/
## 212 http://www.geneontology.org/
## 213 http://www.geneontology.org/
## 214 http://www.geneontology.org/
## 215 https://pfocr.wikipathways.org/
## 216 https://www.ebi.ac.uk/gwas
## 217 http://gedipnet.bicnirrh.res.in/
## 218 https://magnet-winterlab.herokuapp.com/
## numTerms appyter categoryId
## 1 615 ea115789fcbf12797fd692cec6df0ab4dbc79c6a 1
## 2 326 7d42eb43a64a4e3b20d721fc7148f685b53b6b30 1
## 3 290 849f222220618e2599d925b6b51868cf1dab3763 1
## 4 353 7ebe772afb55b63b41b79dd8d06ea0fdd9fa2630 7
## 5 701 ad270a6876534b7cb063e004289dcd4d3164f342 7
## 6 498 497787ebc418d308045efb63b8586f10c526af51 7
## 7 249 4a293326037a5229aedb1ad7b2867283573d8bcd 7
## 8 78 b343994a1b68483b0122b08650201c9b313d5c66 7
## 9 199 5c307674c8b97e098f8399c92f451c0ff21cbf68 7
## 10 142 248c4ed8ea28352795190214713c86a39fd7afab 7
## 11 200 eb26f55d3904cb0ea471998b6a932a9bf65d8e50 7
## 12 269 1
## 13 222 f4029bf6a62c91ab29401348e51df23b8c44c90f 7
## 14 385 69c0cfe07d86f230a7ef01b365abcc7f6e52f138 2
## 15 1136 f531ac2b6acdf7587a54b79b465a5f4aab8f00f9 7
## 16 2139 6d655e0aa3408a7accb3311fbda9b108681a8486 4
## 17 386 8dab0f96078977223646ff63eb6187e0813f1433 7
## 18 84 0741451470203d7c40a06274442f25f74b345c9c 5
## 19 96 31191bfadded5f96983f93b2a113cf2110ff5ddb 5
## 20 641 e1d004d5797cbd2363ef54b1c3b361adb68795c6 7
## 21 5192 bf120b6e11242b1a64c80910d8e89f87e618e235 7
## 22 1779 17a138b0b70aa0e143fe63c14f82afb70bc3ed0a 3
## 23 383 e1bc8a398e9b21f9675fb11bef18087eda21b1bf 1
## 24 474 462045609440fa1e628a75716b81a1baa5bd9145 7
## 25 1796 7d3566b12ebc23dd23d9ca9bb97650f826377b16 2
## 26 1658 d047f6ead7831b00566d5da7a3b027ed9196e104 2
## 27 84 54dcd9438b33301deb219866e162b0f9da7e63a0 2
## 28 71 c3bfc90796cfca8f60cba830642a728e23a53565 7
## 29 476 0b09a9a1aa0af4fc7ea22d34a9ae644d45864bd6 7
## 30 6100 9041f90cccbc18479138330228b336265e09021c 7
## 31 6100 ebc0d905b3b3142f936d400c5f2a4ff926c81c37 7
## 32 90 cb2b92578a91e023d0498a334923ee84add34eca 4
## 33 187 27eca242904d8e12a38cf8881395bc50d57a03e1 4
## 34 85 5abad1fc36216222b0420cadcd9be805a0dda63e 4
## 35 858 e4cdcc7e259788fdf9b25586cce3403255637064 4
## 36 189 c76f5319c33c4833c71db86a30d7e33cd63ff8cf 4
## 37 142 aabdf7017ae55ae75a004270924bcd336653b986 7
## 38 323 45268b7fc680d05dd9a29743c2f2b2840a7620bf 4
## 39 323 5f531580ccd168ee4acc18b02c6bdf8200e19d08 4
## 40 967 eb38dbc3fb20adafa9d6f9f0b0e36f378e75284f 5
## 41 93 75c81676d8d6d99d262c9660edc024b78cfb07c9 5
## 42 207 7
## 43 30 49351dc989f9e6ca97c55f8aca7778aa3bfb84b9 5
## 44 3906 1905132115d22e4119bce543bdacaab074edb363 6
## 45 311 e2b4912cfb799b70d87977808c54501544e4cdc9 6
## 46 941 5216d1ade194ffa5a6c00f105e2b1899f64f45fe 7
## 47 205 fd1332a42395e0bc1dba82868b39be7983a48cc5 7
## 48 402 7e3e99e5aae02437f80b0697b197113ce3209ab0 7
## 49 2192 3804715a63a308570e47aa1a7877f01147ca6202 5
## 50 816 56b6adb4dc8a2f540357ef992d6cd93dfa2907e5 1
## 51 412 55b56cd8cf2ff04b26a09b9f92904008b82f3a6f 1
## 52 59 d40701e21092b999f4720d1d2b644dd0257b6259 2
## 53 2192 ea67371adec290599ddf484ced2658cfae259304 5
## 54 109 c209ae527bc8e98e4ccd27a668d36cd2c80b35b4 7
## 55 216 98366496a75f163164106e72439fb2bf2f77de4e 4
## 56 216 83a710c1ff67fd6b8af0d80fa6148c40dbd9bc64 4
## 57 476 a4c6e217a81a4a58ff5a1c9fc102b70beab298e9 7
## 58 239 70e4eb538daa7688691acfe5d9c3c19022be832b 7
## 59 125 711f0c02b23f5e02a01207174943cfeee9d3ea9c 7
## 60 179 e80d25c56de53c704791ddfdc6ab5eec28ae7243 7
## 61 209 47edfc012bcbb368a10b717d8dca103f7814b5a4 7
## 62 104 ab824aeeff0712bab61f372e43aebb870d1677a9 7
## 63 404 1f7eea2f339f37856522c1f1c70ec74c7b25325f 7
## 64 1389 36e541bee015eddb8d53827579549e30fe7a3286 7
## 65 315 a7acc741440264717ff77751a7e5fed723307835 5
## 66 12 663b665b75a804ef98add689f838b68e612f0d2a 6
## 67 839 0f412e0802d76efa0374504c2c9f5e0624ff7f09 8
## 68 839 9ddc3902fb01fb9eaf1a2a7c2ff3acacbb48d37e 8
## 69 906 068623a05ecef3e4a5e0b4f8db64bb8faa3c897f 8
## 70 32876 76fc5ec6735130e287e62bae6770a3c5ee068645 6
## 71 906 c9c2155b5ac81ac496854fa61ba566dcae06cc80 8
## 72 428 18a081774e6e0aaf60b1a4be7fd20afcf9e08399 2
## 73 2460 53dedc29ce3100930d68e506f941ef59de05dc6b 8
## 74 2460 499882af09c62dd6da545c15cb51c1dc5e234f78 8
## 75 395 712eb7b6edab04658df153605ec6079fa89fb5c7 7
## 76 345 010f1267055b1a1cb036e560ea525911c007a666 4
## 77 33132 5e678b3debe8d8ea95187d0cd35c914017af5eb3 7
## 78 33132 fedbf5e221f45ee60ebd944f92569b5eda7f2330 7
## 79 2918 74b818bd299a9c42c1750ffe43616aa9f7929f02 5
## 80 2918 103738763d89cae894bec9f145ac28167a90e611 5
## 81 261 1eb3c0426140340527155fd0ef67029db2a72191 8
## 82 286 cd95fe1b505ba6f28cd722cfba50fdea979d3b4c 8
## 83 286 74c4f0a0447777005b2a5c00c9882a56dfc62d7c 8
## 84 261 31baa39da2931ddd5f7aedf2d0bbba77d2ba7b46 8
## 85 401 555f68aef0a29a67b614a0d7e20b6303df9069c6 8
## 86 401 1bc2ba607f1ff0dda44e2a15f32a2c04767da18c 8
## 87 312 9e613dba78ef7e60676b13493a9dc49ccd3c8b3f 8
## 88 312 d0c3e2a68e8c611c669098df2c87b530cec3e132 8
## 89 96 957846cb05ef31fc8514120516b73cc65af7980e 7
## 90 96 3bd494146c98d8189898a947f5ef5710f1b7c4b2 7
## 91 3644 1ccc5bce553e0c2279f8e3f4ddcfbabcf566623b 7
## 92 3644 b54a0d4ba525eac4055c7314ca9d9312adcb220c 7
## 93 1530 1f54638e8f45075fb79489f0e0ef906594cb0678 7
## 94 293 43f56da7540195ba3c94eb6e34c522a699b36da9 7
## 95 437 340be98b444cad50bb974df69018fd598e23e5e1 7
## 96 104 5426f7747965c23ef32cff46fabf906e2cd76bfa 1
## 97 285 bb9682d78b8fc43be842455e076166fcd02cefc3 2
## 98 285 78618915009cac3a0663d6f99d359e39a31b6660 2
## 99 237 13d9ab18921d5314a5b2b366f6142b78ab0ff6aa 2
## 100 152 d6a502ef9b4c789ed5e73ca5a8de372796e5c72a 2
## 101 209 3c1e1f7d1a651d9aaa198e73704030716fc09431 2
## 102 112 ca5f6abf7f75d9baae03396e84d07300bf1fd051 2
## 103 7876 255c3db820d612f34310f22a6985dad50e9fe1fe 4
## 104 645 af271913344aa08e6a755af1d433ef15768d749a 7
## 105 995 249247d2f686d3eb4b9e4eb976c51159fac80a89 2
## 106 1842 e8879ab9534794721614d78fe2883e9e564d7759 3
## 107 1302 f0752e4d7f5198f86446678966b260c530d19d78 8
## 108 5231 0705e59bff98deda6e9cbe00cfcdd871c85e7d04 7
## 109 2283 56ec68c32d4e83edc2ee83bea0e9f6a3829b2279 3
## 110 1811 3045dff8181367c1421627bb8e4c5a32c6d67f98 3
## 111 3915 b8620b1a9d0d271d1a2747d8cfc63589dba39991 2
## 112 636 8fed21d22dfcc3015c05b31d942fdfc851cc8e04 7
## 113 972 b4018906e0a8b4e81a1b1afc51e0a2e7655403eb 7
## 114 3166 d9da4dba4a3eb84d4a28a3835c06dfbbe5811f92 7
## 115 816 ecf39c41fa5bc7deb625a2b5761a708676e9db7c 7
## 116 3271 8d8340361dd36a458f1f0a401f1a3141de1f3200 7
## 117 10125 6404c38bffc2b3732de4e3fbe417b5043009fe34 7
## 118 108 4126374338235650ab158ba2c61cd2e2383b70df 5
## 119 125 5496ef9c9ae9429184d0b9485c23ba468ee522a8 5
## 120 352 ce60be284fdd5a9fc6240a355421a9e12b1ee84a 4
## 121 498 6721c5ed97b7772e4a19fdc3f797110df0164b75 2
## 122 1724 8a468c3ae29fa68724f744cbef018f4f3b61c5ab 1
## 123 1135 8
## 124 3240 6b7c7fe2a97b19aecbfba12d8644af6875ad99c4 1
## 125 683 79d13fb03d2fa6403f9be45c90eeda0f6822e269 1
## 126 121 e9b7d8ee237d0a690bd79d970a23a9fa849901ed 6
## 127 1722 be2ca8ef5a8c8e17d7e7bd290e7cbfe0951396c0 1
## 128 12 17ce5192b9eba7d109b6d228772ea8ab222e01ef 6
## 129 4026 287476538ab98337dbe727b3985a436feb6d192a 4
## 130 5103 b5b77681c46ac58cd050e60bcd4ad5041a9ab0a9 7
## 131 446 e9ebe46188efacbe1056d82987ff1c70218fa7ae 7
## 132 1151 79ff80ae9a69dd00796e52569e41422466fa0bee 7
## 133 1958 34d08a4878c19584aaf180377f2ea96faa6a6eb1 1
## 134 36 fdab39c467ba6b0fb0288df1176d7dfddd7196d5 6
## 135 5687 859b100fac3ca774ad84450b1fbb65a78fcc6b12 6
## 136 12558 fc5bf033b932cf173633e783fc8c6228114211f8 6
## 137 3725 375ff8cdd64275a916fa24707a67968a910329bb 4
## 138 2244 0f7fb7f347534779ecc6c87498e96b5460a8d652 4
## 139 12558 f77de51aaf0979dd6f56381cf67ba399b4640d28 6
## 140 5684 25fa899b715cd6a9137f6656499f89cd25144029 6
## 141 2244 0fb9ac92dbe52024661c088f71a1134f00567a8b 4
## 142 3725 ee3adbac2da389959410260b280e7df1fd3730df 4
## 143 104 b50bb9480d8a77103fb75b331fd9dd927246939a 2
## 144 1737 fef3864bcb5dd9e60cee27357eff30226116c49b 7
## 145 472 b0c9e9ebb9014f14561e896008087725a2db24b7 7
## 146 176 e7750958da20f585c8b6d5bc4451a5a4305514ba 7
## 147 571 5f8cf93e193d2bcefa5a37ccdf0eefac576861b0 1
## 148 308 3477bc578c4ea5d851dcb934fe2a41e9fd789bb4 7
## 149 303 187eb44b2d6fa154ebf628eba1f18537f64e797c 7
## 150 1071 18dd5ec520fdf589a93d6a7911289c205e1ddf22 6
## 151 608 a6325ed264f9ac9e6518796076c46a1d885cca7a 6
## 152 558 0b08b32b20854ac8a738458728a9ea50c2e04800 4
## 153 325 b7c4ead26d0eb64f1697c030d31682b581c8bb56 4
## 154 5261 f1bed632e89ebc054da44236c4815cdce03ef5ee 7
## 155 857 958fb52e6215626673a5acf6e9289a1b84d11b4a 4
## 156 1510 e110851dfc763d30946f2abedcc2cd571ac357a0 2
## 157 182 0a95303f8059bec08836ecfe02ce3da951150547 4
## 158 1161 6a7c7321b6b72c5285b722f7902d26a2611117cb 4
## 159 9828 3c261626478ce9e6bf2c7f0a8014c5e901d43dc0 4
## 160 148 47ba06cdc92469ac79400fc57acd84ba343ba616 2
## 161 378 7094b097ae2301a1d6a5bd856a193b084cca993d 5
## 162 913 8c87c8346167bac2ba68195a32458aba9b1acfd1 5
## 163 3729 45b597d7efa5693b7e4172b09c0ed2dda3305582 1
## 164 6715 a592eed13e8e9496aedbab63003b965574e46a65 2
## 165 1721 9196c760e3bcae9c9de1e3f87ad81f96bde24325 2
## 166 802 ad580f3864fa8ff69eaca11f6d2e7f9b86378d08 6
## 167 205 72b0346849570f66a77a6856722601e711596cb4 7
## 168 50 6952efda94663d4bd8db09bf6eeb4e67d21ef58c 2
## 169 1482 8dc362703b38b30ac3b68b6401a9b20a58e7d3ef 6
## 170 1190 9e32560437b11b4628b00ccf3d584360f7f7daee 4
## 171 766 46f8235cb585829331799a71aec3f7c082170219 5
## 172 172 5
## 173 320 2
## 174 622 2
## 175 344 5
## 176 6036 7
## 177 511 7
## 178 1274 7
## 179 4601 3
## 180 1097 5
## 181 3774 4
## 182 478 4
## 183 178 5
## 184 341 5
## 185 950 4
## 186 4269 8
## 187 4269 8
## 188 4216 8
## 189 4216 8
## 190 270 4
## 191 2564 4
## 192 2579 4
## 193 777 5
## 194 350 1
## 195 1395 4
## 196 17326 7
## 197 469 5
## 198 757 1
## 199 601 4
## 200 10850 4
## 201 10424 4
## 202 106 5
## 203 1818 2
## 204 118 3
## 205 338 2
## 206 888 4
## 207 529 3
## 208 233 2
## 209 1748 4
## 210 303 2
## 211 511 5
## 212 5407 3
## 213 474 3
## 214 1147 3
## 215 21845 2
## 216 5271 4
## 217 2388 4
## 218 72 5
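The full listing above runs to 218 libraries, so it can help to narrow it down before choosing which databases to query. The following is only a minimal sketch: it assumes the listing is a data frame with the `libraryName` and `numTerms` columns shown in the output, `find_libraries` is a hypothetical helper (not part of enrichR), and `dbs_demo` simply copies a few rows from the listing so that the example runs without a connection.

```r
# Hypothetical helper (not part of enrichR): keep only the libraries whose
# name matches a regular expression, e.g. a database family or a release year.
find_libraries <- function(dbs, pattern) {
  hits <- grepl(pattern, dbs$libraryName, ignore.case = TRUE)
  dbs[hits, c("libraryName", "numTerms"), drop = FALSE]
}

# A few rows copied from the listing above, so the example runs offline.
dbs_demo <- data.frame(
  libraryName = c("KEGG_2019_Human", "BioPlanet_2019", "KEGG_2021_Human",
                  "WikiPathway_2021_Human", "Human_Gene_Atlas"),
  numTerms    = c(308, 1510, 320, 622, 84),
  stringsAsFactors = FALSE
)

# Select the three pathway libraries used in the next step.
pathway_dbs <- find_libraries(dbs_demo, "^(KEGG|WikiPathway)_2021|^BioPlanet")
print(pathway_dbs)
```

With a live connection the same helper could be applied to the real listing, for example `find_libraries(listEnrichrDbs(), "2021")`.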
dbs_pathway <- c("BioPlanet_2019", "WikiPathway_2021_Human", "KEGG_2021_Human")
if (websiteLive) {
enriched_pathway_brain <- enrichr(brain_total, dbs_pathway)
}
## Uploading data to Enrichr... Done.
## Querying BioPlanet_2019... Done.
## Querying WikiPathway_2021_Human... Done.
## Querying KEGG_2021_Human... Done.
## Parsing results... Done.
if (websiteLive) plotEnrich(title = "Enriched terms of BioPlanet 2019 database", enriched_pathway_brain[[1]], showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value")
if (websiteLive) plotEnrich(title = "Enriched terms of WikiPathway 2021 Human database", enriched_pathway_brain[[2]], showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value")
if (websiteLive) plotEnrich(title = "Enriched terms of KEGG 2021 Human database", enriched_pathway_brain[[3]], showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value")
if (websiteLive) {
enriched_pathway_liver <- enrichr(liver_total, dbs_pathway)
}
## Uploading data to Enrichr... Done.
## Querying BioPlanet_2019... Done.
## Querying WikiPathway_2021_Human... Done.
## Querying KEGG_2021_Human... Done.
## Parsing results... Done.
if (websiteLive) plotEnrich(title = "Enriched terms of BioPlanet 2019 database", enriched_pathway_liver[[1]], showTerms = 5, numChar = 40, y = "Count", orderBy = "P.value")
if (websiteLive) plotEnrich(title = "Enriched terms of WikiPathway 2021 Human database", enriched_pathway_liver[[2]], showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value")
if (websiteLive) plotEnrich(title = "Enriched terms of KEGG 2021 Human database", enriched_pathway_liver[[3]], showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value")
if (websiteLive) {
enriched_pathway_lung <- enrichr(lung_total, dbs_pathway)
}
## Uploading data to Enrichr... Done.
## Querying BioPlanet_2019... Done.
## Querying WikiPathway_2021_Human... Done.
## Querying KEGG_2021_Human... Done.
## Parsing results... Done.
if (websiteLive) plotEnrich(title = "Enriched terms of BioPlanet 2019 database", enriched_pathway_lung[[1]], showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value")
if (websiteLive) plotEnrich(title = "Enriched terms of WikiPathway 2021 Human database", enriched_pathway_lung[[2]], showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value")
if (websiteLive) plotEnrich(title = "Enriched terms of KEGG 2021 Human database", enriched_pathway_lung[[3]], showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value")
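The three pathway blocks above repeat the same calls for each tissue, with the plot titles typed by hand. As a possible simplification (a sketch, not the code that produced the plots in this report), the title can be derived from the library name and the tissues can be processed in a loop. `db_title` and `plot_pathways` are hypothetical helpers, and `gene_lists` is assumed to be a named list built from `brain_total`, `liver_total` and `lung_total`.

```r
# Hypothetical helper: derive the plot title from the library name itself,
# so the title always names the database that was actually queried.
db_title <- function(db) {
  paste("Enriched terms of", gsub("_", " ", db, fixed = TRUE), "database")
}

dbs_pathway <- c("BioPlanet_2019", "WikiPathway_2021_Human", "KEGG_2021_Human")
titles <- vapply(dbs_pathway, db_title, character(1))
print(unname(titles))

# Assumption: `gene_lists` is a named list such as
# list(brain = brain_total, liver = liver_total, lung = lung_total).
# Enrichr is only contacted when this function is actually called.
plot_pathways <- function(gene_lists, dbs = dbs_pathway) {
  for (tissue in names(gene_lists)) {
    enriched <- enrichR::enrichr(gene_lists[[tissue]], dbs)
    for (i in seq_along(dbs)) {
      # plotEnrich returns a ggplot object, so it must be printed inside a loop.
      print(enrichR::plotEnrich(enriched[[i]], showTerms = 5, numChar = 100,
                                y = "Count", orderBy = "P.value",
                                title = paste(tissue, "-", db_title(dbs[i]))))
    }
  }
  invisible(NULL)
}
```

It would then be called as `if (websiteLive) plot_pathways(list(brain = brain_total, liver = liver_total, lung = lung_total))`.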
dbs_celltypes <- c("Human_Gene_Atlas")
if (websiteLive) {
enriched_celltypes_brain <- enrichr(brain_total, dbs_celltypes)
}
## Uploading data to Enrichr... Done.
## Querying Human_Gene_Atlas... Done.
## Parsing results... Done.
if (websiteLive) plotEnrich(title = "Brain - Human Gene Atlas database", enriched_celltypes_brain[[1]], showTerms = 5, numChar = 40, y = "Count", orderBy = "P.value")
dbs_celltypes <- c("Human_Gene_Atlas")
if (websiteLive) {
enriched_celltypes_liver <- enrichr(liver_total, dbs_celltypes)
}
## Uploading data to Enrichr... Done.
## Querying Human_Gene_Atlas... Done.
## Parsing results... Done.
if (websiteLive) plotEnrich(title = "Liver - Human Gene Atlas database", enriched_celltypes_liver[[1]], showTerms = 5, numChar = 40, y = "Count", orderBy = "P.value")
dbs_celltypes <- c("Human_Gene_Atlas")
if (websiteLive) {
enriched_celltypes_lung <- enrichr(lung_total, dbs_celltypes)
}
## Uploading data to Enrichr... Done.
## Querying Human_Gene_Atlas... Done.
## Parsing results... Done.
if (websiteLive) plotEnrich(title = "Lung - Human Gene Atlas database", enriched_celltypes_lung[[1]], showTerms = 5, numChar = 40, y = "Count", orderBy = "P.value")
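Besides reading the bar plots, the top-ranked terms can be pulled out of the result tables directly, which makes it easier to check which tissue each gene list points to. The sketch below assumes the usual columns of an Enrichr result table (`Term`, `Overlap`, `P.value`, `Adjusted.P.value`); `top_terms` is a hypothetical helper, and the `toy` table contains placeholder terms and made-up numbers, not results from this analysis.

```r
# Hypothetical helper: return the n most significant terms of one Enrichr
# result table, ordered by raw p-value (the same ordering used in the plots).
top_terms <- function(res, n = 5) {
  res <- res[order(res$P.value), , drop = FALSE]
  head(res[, c("Term", "Overlap", "Adjusted.P.value")], n)
}

# Toy table with placeholder terms and made-up numbers, only to show the mechanics.
toy <- data.frame(
  Term             = c("Lung", "Liver", "Prefrontal Cortex"),
  Overlap          = c("3/200", "40/250", "5/300"),
  P.value          = c(0.20, 1e-12, 0.04),
  Adjusted.P.value = c(0.30, 3e-12, 0.06),
  stringsAsFactors = FALSE
)

print(top_terms(toy, n = 2))
```

On the real results this would be used as, for example, `top_terms(enriched_celltypes_liver[[1]])`.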
In conclusion, the up-regulated genes correctly identify each tissue. In addition, this workflow is robust enough to find differentially expressed genes and to recognise the tissue of origin even on unfiltered data, that is, without removing pseudogenes, genes on non-canonical chromosomes, rRNA genes and so on.